@fcorowe)
Three main structures are generally used to organise geographic data:
Vector data structures: Vector data structures record geographic information using points, lines and polygons stored in a geographic table. Rows represent individual geographic objects, and columns store information about these objects, i.e. their attributes or features.
Raster data structures: Raster data structures record geographic data in a uniform way over space in the form of a grid. They divide geographic surfaces into cells of constant size. Rows and columns provide information about the geographic location of each grid cell.
Spatial graphs: Spatial graphs store connections between objects through space. These connections may derive from geographical topology (e.g. contiguity), distance, or more sophisticated dimensions, such as interaction flows (e.g. human mobility, trade and information).
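The three structures above can be sketched in a few lines of base R. This is only an illustrative toy, not how sf or terra store data internally: the three areas, their populations, the cell values and the contiguity links are all made-up assumptions.

```r
# Hypothetical identifiers for three adjacent unit squares
ids <- c("A", "B", "C")

# Vector structure: one row per geographic object, attributes in columns,
# and a list column holding the coordinates of each polygon ring
areas <- data.frame(id = ids, population = c(120, 80, 200))
areas$geometry <- list(
  rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)),
  rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0)),
  rbind(c(2, 0), c(3, 0), c(3, 1), c(2, 1), c(2, 0))
)
areas$geometry[[1]]          # coordinates of area A

# Raster structure: a grid of equally sized cells; row and column
# indices locate each cell, and the value is the measurement
elev <- matrix(1:12, nrow = 3, ncol = 4)
elev[2, 3]                   # value of the cell in row 2, column 3

# Spatial graph: contiguity links stored as an adjacency matrix
# (A touches B, B touches C, A and C do not touch)
w <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))
names(which(w["B", ] == 1))  # neighbours of B
```

In practice, sf objects play the role of `areas`, terra objects the role of `elev`, and spatial weights matrices the role of `w`.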
Vector data structures tend to dominate the social sciences, as the interest is often in capturing discrete geographic units containing populations. We therefore focus here on vector data structures.
To understand the structure of vector data, let’s read a dataset
(Liverpool_OA.shp) describing output areas within Liverpool
in the United Kingdom. To read in the data, we use the function
st_read() from the package sf. sf
supports geometry collections, which can contain multiple geometry types
in a single object. sf provides the same functionality
previously provided by three separate packages: sp,
rgdal and rgeos (Robin et al. 2021).
For raster data, I
would recommend using the package terra.
library(sf)
oa_shp <- st_read("./data/Liverpool_OA.shp")
## Reading layer `Liverpool_OA' from data source
## `/Users/franciscorowe/Dropbox/Francisco/Research/github_projects/courses/intro-gds/data/Liverpool_OA.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 1584 features and 18 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 332390.2 ymin: 379748.5 xmax: 345636 ymax: 397980.1
## Projected CRS: Transverse_Mercator
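Each item in the header printed by st_read() can also be queried directly from the object. The sketch below uses the North Carolina shapefile that ships with sf as a stand-in, because the Liverpool file may not be available on your machine; the object name `nc` is an assumption made for illustration only.

```r
library(sf)

# Stand-in data: the example shapefile bundled with the sf package
nc_path <- system.file("shape/nc.shp", package = "sf")
nc <- st_read(nc_path, quiet = TRUE)   # quiet = TRUE suppresses the header

nrow(nc)                       # number of features (rows)
ncol(nc) - 1                   # number of fields (columns minus geometry)
unique(st_geometry_type(nc))   # geometry type(s) in the layer
st_bbox(nc)                    # bounding box: xmin, ymin, xmax, ymax
st_crs(nc)$input               # label of the coordinate reference system
```

The same calls should work on `oa_shp` once it has been read in.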
We have read in an sf data frame containing spatial and attribute
columns. We can examine the content of the data frame by using the
function head(). Below we select the first four columns. The last
column in this example contains the geographic information,
i.e. the geometry.
class(oa_shp)
## [1] "sf" "data.frame"
head(oa_shp[,1:4])
## Simple feature collection with 6 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 335071.6 ymin: 389876.7 xmax: 339426.9 ymax: 394479
## Projected CRS: Transverse_Mercator
##       OA_CD   LSOA_CD   MSOA_CD    LAD_CD                       geometry
## 1 E00176737 E01033761 E02006932 E08000012 MULTIPOLYGON (((335106.3 38...
## 2 E00033515 E01006614 E02001358 E08000012 MULTIPOLYGON (((335810.5 39...
## 3 E00033141 E01006546 E02001365 E08000012 MULTIPOLYGON (((336738 3931...
## 4 E00176757 E01006646 E02001369 E08000012 MULTIPOLYGON (((335914.5 39...
## 5 E00034050 E01006712 E02001375 E08000012 MULTIPOLYGON (((339325 3914...
## 6 E00034280 E01006761 E02001366 E08000012 MULTIPOLYGON (((338198.1 39...
Each row represents an output area. Each output area has multiple attributes (i.e. columns): administrative area codes and a geometry, as well as information on the local population in these areas. The population information is not displayed above because we only selected the first four columns (can you access it?).
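As a hint, the attribute columns of any sf object can be listed and extracted with standard data frame tools. The sketch below again uses the North Carolina stand-in data shipped with sf rather than the Liverpool file, so the column names `NAME` and `BIR74` belong to that example dataset and not to `oa_shp`.

```r
library(sf)

# Stand-in data: the example shapefile bundled with the sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

names(nc)                          # all columns, including those not printed

# Subsetting columns keeps the geometry: it is "sticky" in sf objects
nc_sub <- nc[, c("NAME", "BIR74")]
head(nc_sub)

# Drop the geometry to work with the attributes as a plain data frame
nc_attr <- st_drop_geometry(nc)
summary(nc_attr$BIR74)
```

Replacing `nc` with `oa_shp` and choosing columns from `names(oa_shp)` should give the population attributes of the output areas.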
The content of the geometry column gives sf objects
their spatial powers. oa_shp$geometry is a ‘list column’
that contains all the coordinates of the output area polygons.
sf objects can be plotted quickly with the base R function
plot().
plot(oa_shp$geometry)
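To see what a geometry list column looks like on a small scale, the following sketch builds a toy sf object from two hypothetical unit squares. The object name `toy` and its coordinates are made up for illustration.

```r
library(sf)

# Two hypothetical unit squares; each ring repeats its first point to close
sq1 <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
sq2 <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))

# st_sfc() bundles geometries into a list column; st_sf() attaches attributes
toy <- st_sf(id = c("A", "B"), geometry = st_sfc(sq1, sq2))

class(toy$geometry)     # the geometry column is an 'sfc' object
is.list(toy$geometry)   # ... which is a list with one element per row
toy$geometry[[1]]       # the polygon stored in the first row

# st_geometry() is the accessor equivalent of toy$geometry
plot(st_geometry(toy))
```

Plotting only the geometry, as above, draws the outlines; calling plot() on the whole sf object instead draws one map per attribute column.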
Attributes:
Challenges:
Rowe, F. 2021. Big Data and Human Geography. In: Demeritt, D. and Lees, L. (eds) Concise Encyclopedia of Human Geography. Edward Elgar Encyclopedias in the Social Sciences series.
Rowe, F. and Arribas-Bel, D. 2021. Spatial Modelling for Data Scientists.
Different classifications of spatial data types exist. Knowing the structure of the data at hand is important for choosing appropriate analytical methods.